Building and plotting a Least-Squares Regression Line by hand

Import the modules we need.

Let's load the data in now.

We can definitely see a relationship between these two variables, so this should serve us well as test data when looking at correlation and regression.

Since we are not trying to predict anything about the classes for now, we will just use the features provided to explore these ideas.

Correlation

$\Large{r = \frac{1}{n-1} \sum_{i=1}^{n}(\frac{x_i - \overline{x} }{s_x})(\frac{y_i - \overline{y}}{s_y})}$
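As a minimal sketch, the formula above can be translated almost term for term into plain Python. The function name `correlation` and the sample values are assumptions for illustration (the numbers are made up, not the petal measurements), and `statistics.stdev` is used because it returns the sample standard deviation with the same $n-1$ denominator.

```python
from statistics import mean, stdev


def correlation(x, y):
    """Pearson's r: the sum of products of z-scores, divided by n - 1."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)  # sample standard deviations (n - 1)
    total = sum(
        ((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
        for xi, yi in zip(x, y)
    )
    return total / (n - 1)


# Made-up illustrative values, not the data used in this notebook.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 2.0, 4.0]
print(round(correlation(x, y), 3))
```

Note that a constant `x` or `y` has a standard deviation of zero, so this sketch would raise a `ZeroDivisionError` in that case.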

Let's quickly verify this function against scipy.
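One way such a check might look, assuming a hand-written `correlation` function like the one sketched here (a hypothetical name) and made-up sample values: `scipy.stats.pearsonr` returns the coefficient together with a p-value, so only the first value is compared.

```python
from statistics import mean, stdev

from scipy import stats


def correlation(x, y):
    """Hand-rolled Pearson's r (hypothetical helper for this sketch)."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)
    total = sum(
        ((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
        for xi, yi in zip(x, y)
    )
    return total / (n - 1)


# Made-up illustrative values, not the data used in this notebook.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 2.0, 4.0]

r_manual = correlation(x, y)
r_scipy, p_value = stats.pearsonr(x, y)  # (coefficient, p-value)

# The two results should agree up to floating-point rounding.
print(r_manual, r_scipy)
print(abs(r_manual - r_scipy) < 1e-9)
```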

Now that we've sufficiently reinvented the wheel, we can move on.

From this r coefficient of 0.963, we see that petal width and petal length have a strong positive correlation with each other.

Least Squares Regression Line Equation

$\Large \hat{y} = b_0 + b_1x$

where

$\Large b_1 = r\frac{s_y}{s_x} $

and

$\Large b_0 = \overline{y} - b_1\overline{x} $

Now that we have a nice, compact, printable object, we can create the function that uses the formulas above to calculate the equation of the line.
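A minimal sketch of what such a function could look like. The `RegressionLine` named tuple is a hypothetical stand-in for the printable object, and `least_squares_line` is an assumed name; both follow the formulas for $b_1$ and $b_0$ directly, and the sample values are made up.

```python
from statistics import mean, stdev
from typing import NamedTuple


class RegressionLine(NamedTuple):
    """Hypothetical container for a fitted line y_hat = b0 + b1 * x."""

    intercept: float  # b_0
    slope: float      # b_1

    def __str__(self):
        return f"y_hat = {self.intercept:.3f} + {self.slope:.3f}x"

    def predict(self, x):
        return self.intercept + self.slope * x


def least_squares_line(x, y):
    """Fit the line using b1 = r * s_y / s_x and b0 = y_bar - b1 * x_bar."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)  # sample standard deviations (n - 1)
    r = sum(
        (xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)
    ) / ((n - 1) * s_x * s_y)
    slope = r * s_y / s_x
    intercept = y_bar - slope * x_bar
    return RegressionLine(intercept, slope)


# Made-up illustrative values, not the data used in this notebook.
line = least_squares_line([1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 4.0])
print(line)
```

Returning a named tuple keeps the result unpackable as `(intercept, slope)` while still printing as a readable equation.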

Let's call our function to get the values and the regression line.

And now we plot our values and the line to see how we did.
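A plot along these lines could be sketched with matplotlib as follows. The helper names `fit_line` and `plot_fit`, the generic axis labels, and the sample values are all assumptions for illustration; the fit itself reuses the same formulas for $b_1$ and $b_0$.

```python
from statistics import mean, stdev

import matplotlib.pyplot as plt


def fit_line(x, y):
    """Return (b0, b1) from b1 = r * s_y / s_x and b0 = y_bar - b1 * x_bar."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)
    r = sum(
        (xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)
    ) / ((n - 1) * s_x * s_y)
    b1 = r * s_y / s_x
    b0 = y_bar - b1 * x_bar
    return b0, b1


def plot_fit(x, y, b0, b1):
    """Scatter the observations and draw the fitted line across their x-range."""
    fig, ax = plt.subplots()
    ax.scatter(x, y, label="observations")
    xs = [min(x), max(x)]  # two endpoints are enough for a straight line
    ax.plot(xs, [b0 + b1 * v for v in xs], color="red",
            label=f"y_hat = {b0:.2f} + {b1:.2f}x")
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    ax.legend()
    return fig, ax


# Made-up illustrative values, not the data used in this notebook.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 2.0, 4.0]
b0, b1 = fit_line(x, y)
fig, ax = plot_fit(x, y, b0, b1)
```

In a notebook the figure renders on its own; in a script, a call to `plt.show()` or `fig.savefig(...)` would be needed to see it.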

The least-squares line we created here from the formulas follows the points closely.

For now, this is where we will end. Keep in mind that using libraries for this is much more practical, less time-consuming, and generally makes more sense. For reviewing formulas and concepts, or just for general programming practice, however, writing your own implementation can be worthwhile.